Results 1 - 20 of 116
1.
Brief Bioinform; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38388681

ABSTRACT

MOTIVATION: Cell-type annotation of single-cell RNA-sequencing (scRNA-seq) data is a hallmark of biomedical research and clinical application. Current annotation tools usually assume the simultaneous acquisition of well-annotated data, but without the ability to expand knowledge from new data. Yet, such tools are inconsistent with the continuous emergence of scRNA-seq data, calling for a continuous cell-type annotation model. In addition, by their powerful ability of information integration and model interpretability, transformer-based pre-trained language models have led to breakthroughs in single-cell biology research. Therefore, the systematic combining of continual learning and pre-trained language models for cell-type annotation tasks is inevitable. RESULTS: We herein propose a universal cell-type annotation tool, called CANAL, that continuously fine-tunes a pre-trained language model trained on a large amount of unlabeled scRNA-seq data, as new well-labeled data emerges. CANAL essentially alleviates the dilemma of catastrophic forgetting, both in terms of model inputs and outputs. For model inputs, we introduce an experience replay schema that repeatedly reviews previous vital examples in current training stages. This is achieved through a dynamic example bank with a fixed buffer size. The example bank is class-balanced and proficient in retaining cell-type-specific information, particularly facilitating the consolidation of patterns associated with rare cell types. For model outputs, we utilize representation knowledge distillation to regularize the divergence between previous and current models, resulting in the preservation of knowledge learned from past training stages. Moreover, our universal annotation framework considers the inclusion of new cell types throughout the fine-tuning and testing stages. 
We can continuously expand the cell-type annotation library by absorbing new cell types from newly arrived, well-annotated training datasets, as well as automatically identify novel cells in unlabeled datasets. Comprehensive experiments with data streams under various biological scenarios demonstrate the versatility and high model interpretability of CANAL. AVAILABILITY: An implementation of CANAL is available from https://github.com/aster-ww/CANAL-torch. CONTACT: dengmh@pku.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Journal Name online.


Subjects
Gene Expression Profiling; Software; Gene Expression Profiling/methods; Single-Cell Gene Expression Analysis; Single-Cell Analysis/methods; Language; Sequence Analysis, RNA/methods
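
The experience replay scheme described in the CANAL abstract (a class-balanced example bank with a fixed buffer size) can be illustrated with a minimal sketch. This is not the CANAL implementation: the class name `ExampleBank`, its methods, and the use of random subsampling in place of any "vital example" selection are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


class ExampleBank:
    """Fixed-size, class-balanced replay buffer (illustrative sketch only)."""

    def __init__(self, buffer_size, seed=0):
        self.buffer_size = buffer_size
        self.bank = defaultdict(list)  # cell-type label -> stored examples
        self.rng = random.Random(seed)

    def update(self, examples, labels):
        """Merge a newly arrived batch, then cap every class at an equal quota."""
        for x, y in zip(examples, labels):
            self.bank[y].append(x)
        if not self.bank:
            return
        quota = self.buffer_size // len(self.bank)
        if quota == 0:
            raise ValueError("buffer_size is smaller than the number of classes")
        for y in list(self.bank):
            if len(self.bank[y]) > quota:
                # Assumption: random subsampling stands in for a real
                # importance-based selection of examples.
                self.bank[y] = self.rng.sample(self.bank[y], quota)

    def replay(self):
        """Return every stored (example, label) pair for the next training stage."""
        return [(x, y) for y, items in self.bank.items() for x in items]
```

Because the quota is recomputed whenever a new cell type arrives, common cell types are trimmed while rare ones keep all of their examples, which is one simple way to obtain the class balance the abstract mentions.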
2.
Brief Bioinform; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38366803

ABSTRACT

The evolution in single-cell RNA sequencing (scRNA-seq) technology has opened a new avenue for researchers to inspect cellular heterogeneity with single-cell precision. One crucial aspect of this technology is cell-type annotation, which is fundamental for any subsequent analysis in single-cell data mining. Recently, the scientific community has seen a surge in the development of automatic annotation methods aimed at this task. However, these methods generally operate at a steady-state total cell-type capacity, significantly restricting the cell annotation systems' capacity for continuous knowledge acquisition. Furthermore, creating a unified scRNA-seq annotation system remains challenged by the need to progressively expand its understanding of ever-increasing cell-type concepts derived from a continuous data stream. In response to these challenges, this paper presents a novel and challenging setting for annotation, namely cell-type incremental annotation. This concept is designed to perpetually enhance cell-type knowledge, gleaned from continuously incoming data. This task encounters difficulty with data stream samples that can only be observed once, leading to catastrophic forgetting. To address this problem, we introduce our breakthrough methodology termed scEVOLVE, an incremental annotation method. This innovative approach is built upon the methodology of contrastive sample replay combined with the fundamental principle of partition confidence maximization. Specifically, we initially retain and replay sections of the old data in each subsequent training phase, then establish a unique prototypical learning objective to mitigate the cell-type imbalance problem, as an alternative to using cross-entropy. To effectively emulate a model that trains concurrently with complete data, we introduce a cell-type decorrelation strategy that efficiently scatters feature representations of each cell type uniformly. 
We constructed the scEVOLVE framework with simplicity and ease of integration into most deep softmax-based single-cell annotation methods. Thorough experiments conducted on a range of meticulously constructed benchmarks consistently prove that our methodology can incrementally learn numerous cell types over an extended period, outperforming other strategies that fail quickly. As far as our knowledge extends, this is the first attempt to propose and formulate an end-to-end algorithm framework to address this new, practical task. Additionally, scEVOLVE, coded in Python using the Pytorch machine-learning library, is freely accessible at https://github.com/aimeeyaoyao/scEVOLVE.


Subjects
Algorithms; Single-Cell Gene Expression Analysis; Benchmarking; Entropy; Gene Library; Sequence Analysis, RNA; Gene Expression Profiling; Cluster Analysis
3.
Brief Bioinform; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279647

ABSTRACT

MOTIVATION: The rapid development of spatial transcriptome technologies has enabled researchers to acquire single-cell-level spatial data at an affordable price. However, computational analysis tools, such as annotation tools, tailored for these data are still lacking. Recently, many computational frameworks have emerged to integrate single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics datasets. While some frameworks can utilize well-annotated scRNA-seq data to annotate spatial expression patterns, they overlook critical aspects. First, existing tools do not explicitly consider cell type mapping when aligning the two modalities. Second, current frameworks lack the capability to detect novel cells, which remains a key interest for biologists. RESULTS: To address these problems, we propose an annotation method for spatial transcriptome data called SPANN. The main tasks of SPANN are to transfer cell-type labels from well-annotated scRNA-seq data to newly generated single-cell resolution spatial transcriptome data and discover novel cells from spatial data. The major innovations of SPANN come from two aspects: SPANN automatically detects novel cells from unseen cell types while maintaining high annotation accuracy over known cell types. SPANN finds a mapping between spatial transcriptome samples and RNA data prototypes and thus conducts cell-type-level alignment. Comprehensive experiments using datasets from various spatial platforms demonstrate SPANN's capabilities in annotating known cell types and discovering novel cell states within complex tissue contexts. AVAILABILITY: The source code of SPANN can be accessed at https://github.com/ddb-qiwang/SPANN-torch. CONTACT: dengmh@math.pku.edu.cn.


Subjects
Single-Cell Gene Expression Analysis; Transcriptome; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Gene Expression Profiling/methods; Software
4.
Adv Healthc Mater; 13(3): e2302117, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37922499

ABSTRACT

Prostate-specific antigen (PSA) is the common serum-relevant biomarker for early prostate cancer (PCa) detection in clinical diagnosis. However, it is difficult to accurately diagnose PCa in the early stage due to the low specificity of PSA. Herein, a new solution-gated graphene field transistor (SGGT) biosensor with dual-gate for dual-biomarker detection is designed. The sensing mechanism is that the designed aptamers immobilized on the surface of the gate electrodes can capture PSA and sarcosine (SAR) biomolecules and induce the capacitance changes of the electric double layers of SGGT. The limits of detection for the PSA and SAR biomarkers can reach 0.01 fg mL⁻¹, which is three to four orders of magnitude lower than previously reported assays. The detection times of PSA and SAR are ≈4.5 and ≈13 min, respectively, which are significantly faster than the detection time (1-2 h) of conventional methods. Testing of clinical serum samples demonstrates that the biosensor can distinguish the PCa patients from the control group and the diagnosis accuracy can reach 100%. The SGGT biosensor can be integrated into a portable platform, and the diagnostic results can be displayed directly on a smartphone/Pad. Therefore, the integrated portable platform of the biosensor can distinguish cancer types through the dual-biomarker detection.


Subjects
Biosensing Techniques; Graphite; Prostatic Neoplasms; Male; Humans; Prostate-Specific Antigen; Prostatic Neoplasms/diagnosis; Electrodes; Biosensing Techniques/methods
5.
Int J Mol Sci; 24(23), 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-38068885

ABSTRACT

Carotenoids are important pigments in pepper fruits. The colors of each pepper are mainly determined by the composition and content of carotenoid. The 'ZY' variety, which has yellow fruit, is a natural mutant derived from a branch mutant of 'ZR' with different colors. ZY and ZR exhibit obvious differences in fruit color, but no other obvious differences in other traits. To investigate the main reasons for the formation of different colored pepper fruits, transcriptome and metabolome analyses were performed in three developmental stages (S1-S3) in two cultivars. The results revealed that these structural genes (PSY1, CRTISO, CCD1, CYP97C1, VDE1, CCS, NCED1 and NCED2) related to carotenoid biosynthesis were expressed differentially in the two cultivars. Capsanthin and capsorubin mainly accumulated in ZR and were almost non-existent in ZY. S2 is the fruit color-changing stage; this may be a critical period for the development of different color formation of ZY and ZR. A combination of transcriptome and metabolome analyses indicated that CCS, NCED2, AAO4, VDE1 and CYP97C1 genes were key to the differences in the total carotenoid content. These new insights into pepper fruit coloration may help to improve fruit breeding strategies.


Subjects
Carotenoids; Plant Breeding; Carotenoids/metabolism; Gene Expression Profiling; Fruit/metabolism; Transcriptome; Metabolome; Gene Expression Regulation, Plant
6.
Anal Chem; 95(48): 17750-17758, 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-37971943

ABSTRACT

A new type of carbon dot (CD)-functionalized solution-gated graphene transistor (SGGT) sensor was designed and fabricated for the highly sensitive and highly selective detection of glutathione (GSH). The CDs were synthesized via a one-step hydrothermal method using DL-thioctic acid and triethylenetetramine (TETA) as sources of S, N, and C. The CDs have abundant amino and carboxyl groups and were used to modify the surface of the gate electrode of SGGT as probes for detecting GSH. Remarkably, the CDs-SGGT sensor exhibited excellent selectivity and ultrahigh sensitivity to GSH, with an ultralow limit of detection (LOD) as low as 10⁻¹⁹ M. To the best of our knowledge, the sensor outperforms previously reported systems. Moreover, the CDs-SGGT sensor shows rapid detection and good stability. More importantly, the detection of GSH in artificial serum samples was successfully demonstrated.


Subjects
Graphite; Quantum Dots; Carbon; Limit of Detection; Glutathione
7.
Genome Res; 33(10): 1788-1805, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37827697

ABSTRACT

Cell-cell communication (CCC) is critical for determining cell fates and functions in multicellular organisms. With the advent of single-cell RNA-sequencing (scRNA-seq) and spatial transcriptomics (ST), an increasing number of CCC inference methods have been developed. Nevertheless, a thorough comparison of their performances is yet to be conducted. To fill this gap, we developed a systematic benchmark framework called ESICCC to evaluate 18 ligand-receptor (LR) inference methods and five ligand/receptor-target inference methods using a total of 116 data sets, including 15 ST data sets, 15 sets of cell line perturbation data, two sets of cell type-specific expression/proteomics data, and 84 sets of sampled or unsampled scRNA-seq data. We evaluated and compared the agreement, accuracy, robustness, and usability of these methods. Regarding accuracy evaluation, RNAMagnet, CellChat, and scSeqComm emerge as the three best-performing methods for intercellular ligand-receptor inference based on scRNA-seq data, whereas stMLnet and HoloNet are the best methods for predicting ligand/receptor-target regulation using ST data. To facilitate the practical applications, we provide a decision-tree-style guideline for users to easily choose best tools for their specific research concerns in CCC inference, and develop an ensemble pipeline CCCbank that enables versatile combinations of methods and databases. Moreover, our comparative results also uncover several critical influential factors for CCC inference, such as prior interaction information, ligand-receptor scoring algorithm, intracellular signaling complexity, and spatial relationship, which may be considered in the future studies to advance the development of new methodologies.


Subjects
Single-Cell Analysis; Software; Ligands; Single-Cell Analysis/methods; Algorithms; Cell Communication/genetics; Sequence Analysis, RNA/methods
8.
Genes (Basel); 14(9), 2023 Aug 30.
Article in English | MEDLINE | ID: mdl-37761877

ABSTRACT

Plant homeodomain (PHD) transcription factor genes are involved in plant development and in a plant's response to stress. However, there are few reports about this gene family in peppers (Capsicum annuum L.). In this study, the pepper inbred line "Zunla-1" was used as the reference genome, and a total of 43 PHD genes were identified, and systematic analysis was performed to study the chromosomal location, evolutionary relationship, gene structure, domains, and upstream cis-regulatory elements of the CaPHD genes. The fewest CaPHD genes were located on chromosome 4, while the most were on chromosome 3. Genes with similar gene structures and domains were clustered together. Expression analysis showed that the expression of CaPHD genes was quite different in different tissues and in response to various stress treatments. The expression of CaPHD17 was different in the early stage of flower bud development in the near-isogenic cytoplasmic male-sterile inbred and the maintainer inbred lines. It is speculated that this gene is involved in the development of male sterility in pepper. CaPHD37 was significantly upregulated in leaves and roots after heat stress, and it is speculated that CaPHD37 plays an important role in tolerating heat stress in pepper; in addition, CaPHD9, CaPHD10, CaPHD11, CaPHD17, CaPHD19, CaPHD20, and CaPHD43 were not sensitive to abiotic stress or hormonal factors. This study will provide the basis for further research into the function of CaPHD genes in plant development and responses to abiotic stresses and hormones.


Subjects
Food; Piper nigrum; Humans; Genes, Homeobox; Stress, Physiological/genetics; Transcription Factors/genetics; Flowers/genetics
9.
Genes (Basel); 14(9), 2023 Sep 12.
Article in English | MEDLINE | ID: mdl-37761928

ABSTRACT

An in-house tomato inbred line, YNAU335, was planted in a greenhouse in spring from 2014 to 2017, and showed immunity to tomato spotted wilt virus (TSWV). YNAU335 was infected with TSWV in the spring from 2018 to 2020, and disease was observed on the leaves, sepals, and fruits. In 2021 and 2022, YNAU335 was planted in spring in the same greenhouse, which was suspected of being infected with TSWV, and visible disease symptoms were observed on the fruits. Transmission electron microscopy, deep sequencing of small RNAs, and molecular mutation diagnosis were used to analyze the pathological features and genetic polymorphism of TSWV infecting tomato fruit. Typical TSWV virions were observed in the infected fruits, but not leaves from YNAU335 grown between 2021 and 2022, and cross-infection was very rarely observed. The number of mitochondria and chloroplasts increased, but the damage to the mitochondria was greater than that seen in the chloroplasts. Small RNA deep sequencing revealed the presence of multiple viral species in TSWV-infected and non-infected tomato samples grown between 2014-2022. Many virus species, including TSWV, which accounted for the largest proportion, were detected in the TSWV-infected tomato leaves and fruit. However, a variety of viruses other than TSWV were also detected in the non-infected tissues. The amino acids of TSWV nucleocapsid proteins (NPs) and movement proteins (MPs) from diseased fruits of YNAU335 picked in 2021-2022 were found to be very diverse. Compared with previously identified NPs and MPs from TSWV isolates, those found in this study could be divided into three types: non-resistance-breaking, resistance-breaking, and other isolates. 
The number of positive clones and a comparison with previously identified amino acid mutations suggested that mutation F at AA118 of the MP (GenBank OL310707) is likely the key to breaking the resistance to TSWV, and this mutation developed only in the infected fruit of YNAU335 grown in 2021 and 2022.

10.
Genes (Basel); 14(8), 2023 Aug 14.
Article in English | MEDLINE | ID: mdl-37628673

ABSTRACT

Although thaumatin-like proteins (TLPs) are involved in resistance to a variety of fungal diseases, whether the TLP5 and TLP6 genes in tomato plants (Solanum lycopersicum) confer resistance to the pathogenesis of soil-borne diseases has not been demonstrated. In this study, five soil-borne diseases (fungal pathogens: Fusarium solani, Fusarium oxysporum, and Verticillium dahliae; bacterial pathogens: Clavibacter michiganense subsp. michiganense and Ralstonia solanacearum) were used to infect susceptible "No. 5" and disease-resistant "S-55" tomato cultivars. We found that SlTLP5 and SlTLP6 transcript levels were higher in susceptible cultivars treated with the three fungal pathogens than in those treated with the two bacterial pathogens and that transcript levels varied depending on the pathogen. Moreover, the SlTLP5 and SlTLP6 transcript levels were much higher in disease-resistant cultivars than in disease-susceptible cultivars, and the SlTLP5 and SlTLP6 transcript levels were higher in cultivars treated with the same fungal pathogen than in those treated with bacterial pathogens. SlTLP6 transcript levels were higher than SlTLP5. SlTLP5 and SlTLP6 overexpression and gene-edited transgenic mutants were generated in both susceptible and resistant cultivars. Overexpression and knockout increased and decreased resistance to the five diseases, respectively. Transgenic plants overexpressing SlTLP5 and SlTLP6 inhibited the activities of peroxidase (POD), superoxide dismutase (SOD), ascorbate peroxidase (APX), and catalase (CAT) after inoculation with fungal pathogens, and the activities of POD, SOD, and APX were similar to those of fungi after infection with bacterial pathogens. The activities of CAT were increased, and the activity of β-1,3-glucanase was increased in both the fungal and bacterial treatments. Overexpressed plants were more resistant than the control plants. 
After SlTLP5 and SlTLP6 knockout plants were inoculated, POD, SOD, and APX had no significant changes, but CAT activity increased and decreased significantly after the fungal and bacterial treatments, contrary to overexpression. The activity of β-1,3-glucanase decreased in the treatment of the five pathogens, and the knocked-out plants were more susceptible to disease than the control. In summary, this study contributes to the further understanding of TLP disease resistance mechanisms in tomato plants.


Subjects
Solanum lycopersicum; Solanum lycopersicum/genetics; Peroxidase; Superoxide Dismutase; Peroxidases; Ascorbate Peroxidases
11.
Bioinformatics; 39(7), 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37369035

ABSTRACT

MOTIVATION: In recent years, high-throughput sequencing technologies have made large-scale protein sequences accessible. However, their functional annotations usually rely on low-throughput and pricey experimental studies. Computational prediction models offer a promising alternative to accelerate this process. Graph neural networks have shown significant progress in protein research, but capturing long-distance structural correlations and identifying key residues in protein graphs remains challenging. RESULTS: In the present study, we propose a novel deep learning model named Hierarchical graph transformEr with contrAstive Learning (HEAL) for protein function prediction. The core feature of HEAL is its ability to capture structural semantics using a hierarchical graph Transformer, which introduces a range of super-nodes mimicking functional motifs to interact with nodes in the protein graph. These semantic-aware super-node embeddings are then aggregated with varying emphasis to produce a graph representation. To optimize the network, we utilized graph contrastive learning as a regularization technique to maximize the similarity between different views of the graph representation. Evaluation of the PDBch test set shows that HEAL-PDB, trained on fewer data, achieves comparable performance to the recent state-of-the-art methods, such as DeepFRI. Moreover, HEAL, with the added benefit of unresolved protein structures predicted by AlphaFold2, outperforms DeepFRI by a significant margin on Fmax, AUPR, and Smin metrics on PDBch test set. Additionally, when there are no experimentally resolved structures available for the proteins of interest, HEAL can still achieve better performance on AFch test set than DeepFRI and DeepGOPlus by taking advantage of AlphaFold2 predicted structures. Finally, HEAL is capable of finding functional sites through class activation mapping. 
AVAILABILITY AND IMPLEMENTATION: Implementations of our HEAL can be found at https://github.com/ZhonghuiGu/HEAL.


Subjects
Benchmarking; High-Throughput Nucleotide Sequencing; Amino Acid Sequence; Neural Networks, Computer; Semantics
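
The HEAL abstract reports results on the Fmax metric. As a reading aid, the sketch below computes Fmax under the commonly used protein-centric definition: at each score threshold, precision is averaged over proteins with at least one prediction and recall over all proteins with annotations, and the best harmonic mean across thresholds is reported. The function name, the input layout (one set of true terms and one term-to-score dictionary per protein) and the threshold grid are assumptions; the evaluation code used in the paper may differ in detail.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax (illustrative sketch).

    y_true:  list of sets, the true function terms of each protein.
    y_score: list of dicts mapping term -> predicted score in [0, 1].
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 100)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            true_positives = len(predicted & truth)
            if predicted:  # precision only counts proteins with a prediction
                precisions.append(true_positives / len(predicted))
            if truth:      # recall counts every annotated protein
                recalls.append(true_positives / len(truth))
        if not precisions or not recalls:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

A call such as `fmax([{"a", "b"}, {"c"}], [{"a": 0.9, "b": 0.4}, {"c": 0.8}])` sweeps the thresholds and returns the best F-score found.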
12.
Adv Healthc Mater; 12(25): e2300563, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37377126

ABSTRACT

Persistent infection with high-risk human papillomavirus type 16 (HPV16) is considered an essential factor in the development of cervical cancer. Although polymerase chain reaction, loop-mediated amplification, and microfluidic chips are used to detect HPV16, these methods still have drawbacks, including long assay times and false-positive results. The CRISPR-Cas system is widely used in the field of biological detection due to its precise targeted recognition capability. In this contribution, a novel solution-gated graphene transistor sensor is designed to realize the unamplified and label-free detection of HPV16 DNA. Using the precise recognition of the CRISPR-Cas12a system and the gate functionalization, HPV16 DNA can be precisely identified without the need for amplification or labeling. The limit of detection of the sensor can reach 8.3 × 10⁻¹⁸ M, and detection can be completed within 20 min. Additionally, heat-inactivated clinical samples can be clearly distinguished by the sensor, and the diagnostic results show a high degree of agreement with q-PCR detection.


Subjects
CRISPR-Cas Systems; Graphite; Humans; Human papillomavirus 16/genetics; DNA/genetics; Nucleic Acid Amplification Techniques
13.
Brief Bioinform; 24(2), 2023 Mar 19.
Article in English | MEDLINE | ID: mdl-36869836

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology allows us to study gene expression heterogeneity at the cellular level. Cell annotation is the basis for subsequent downstream analysis in single-cell data mining. As more and more well-annotated scRNA-seq reference data become available, many automatic annotation methods have sprung up in order to simplify the cell annotation process on unlabeled target data. However, existing methods rarely explore the fine-grained semantic knowledge of novel cell types absent from the reference data, and they are usually susceptible to batch effects on the classification of seen cell types. Taking into consideration the limitations above, this paper proposes a new and practical task called generalized cell type annotation and discovery for scRNA-seq data whereby target cells are labeled with either seen cell types or cluster labels, instead of a unified 'unassigned' label. To accomplish this, we carefully design a comprehensive evaluation benchmark and propose a novel end-to-end algorithmic framework called scGAD. Specifically, scGAD first builds the intrinsic correspondences on seen and novel cell types by retrieving geometrically and semantically mutual nearest neighbors as anchor pairs. Together with the similarity affinity score, a soft anchor-based self-supervised learning module is then designed to transfer the known label information from reference data to target data and aggregate the new semantic knowledge within target data in the prediction space. To enhance the inter-type separation and intra-type compactness, we further propose a confidential prototype self-supervised learning paradigm to implicitly capture the global topological structure of cells in the embedding space. Such a bidirectional dual alignment mechanism between embedding space and prediction space can better handle batch effect and cell type shift. 
Extensive results on massive simulation datasets and real datasets demonstrate the superiority of scGAD over various state-of-the-art clustering and annotation methods. We also implement marker gene identification to validate the effectiveness of scGAD in clustering novel cell types and their biological significance. To the best of our knowledge, we are the first to introduce this new and practical task and propose an end-to-end algorithmic framework to solve it. Our method scGAD is implemented in Python using the Pytorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scGAD.


Subjects
Algorithms; Gene Expression Profiling; Gene Expression Profiling/methods; Single-Cell Analysis/methods; Computer Simulation; Cluster Analysis; Sequence Analysis, RNA/methods
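
The scGAD abstract describes retrieving mutual nearest neighbors between reference and target cells as anchor pairs. The sketch below shows only the basic geometric idea: a pair is kept when each cell is among the other's k nearest neighbors. It is not the scGAD code; the function names, the Euclidean distance and the default `k` are assumptions, and the "semantic" neighbors and affinity scores mentioned in the abstract are not modeled.

```python
import math


def k_nearest(query, points, k):
    """Indices of the k points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda j: math.dist(query, points[j]))
    return set(order[:k])


def mutual_nearest_neighbors(ref, tgt, k=1):
    """Return (i, j) pairs where ref[i] and tgt[j] are in each other's k-NN sets."""
    ref_nn = [k_nearest(r, tgt, k) for r in ref]  # nearest targets per reference cell
    tgt_nn = [k_nearest(t, ref, k) for t in tgt]  # nearest references per target cell
    return [
        (i, j)
        for i in range(len(ref))
        for j in sorted(ref_nn[i])
        if i in tgt_nn[j]
    ]
```

Target cells that end up in no anchor pair are the natural candidates for novel cell types, since no reference cell considers them a close neighbor in return.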
14.
Elife; 12, 2023 Feb 17.
Article in English | MEDLINE | ID: mdl-36799896

ABSTRACT

Allostery is fundamental to many biological processes. Due to the distant regulation nature, how allosteric mutations, modifications, and effector binding impact protein function is difficult to forecast. In protein engineering, remote mutations cannot be rationally designed without large-scale experimental screening. Allosteric drugs have raised much attention due to their high specificity and possibility of overcoming existing drug-resistant mutations. However, optimization of allosteric compounds remains challenging. Here, we developed a novel computational method KeyAlloSite to predict allosteric site and to identify key allosteric residues (allo-residues) based on the evolutionary coupling model. We found that protein allosteric sites are strongly coupled to orthosteric site compared to non-functional sites. We further inferred key allo-residues by pairwise comparing the difference of evolutionary coupling scores of each residue in the allosteric pocket with the functional site. Our predicted key allo-residues are in accordance with previous experimental studies for typical allosteric proteins like BCR-ABL1, Tar, and PDZ3, as well as key cancer mutations. We also showed that KeyAlloSite can be used to predict key allosteric residues distant from the catalytic site that are important for enzyme catalysis. Our study demonstrates that weak coevolutionary couplings contain important information of protein allosteric regulation function. KeyAlloSite can be applied in studying the evolution of protein allosteric regulation, designing and optimizing allosteric drugs, and performing functional protein design and enzyme engineering.


Subjects
Proteins; Proteins/metabolism; Allosteric Site; Allosteric Regulation/genetics; Catalytic Domain
15.
Adv Sci (Weinh); 10(4): e2205886, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36480308

ABSTRACT

The incidence of prostate cancer (PCa) in men globally increases as the standard of living improves. Detection of the blood serum biomarker prostate-specific antigen (PSA) is the gold standard assay, but it does not meet the requirements of early detection. Herein, a solution-gated graphene transistor (SGGT) biosensor for the ultrasensitive and rapid quantitative detection of the early prostate cancer-relevant biomarker miRNA-21 is reported. The designed single-stranded DNA (ssDNA) probes immobilized on the Au gate can hybridize effectively with the target miRNA-21 molecules and induce Dirac voltage shifts in the SGGT transfer curves. The limit of detection (LOD) of the sensor can reach 10⁻²⁰ M without amplification or any chemical or biological labeling. The linear detection range is from 10⁻²⁰ to 10⁻¹² M. The sensor can realize real-time detection of miRNA-21 molecules in less than 5 min and can clearly distinguish one-mismatched miRNA-21 molecules. Blood serum samples from patients were measured without RNA extraction or amplification. The results demonstrated that the biosensor can clearly distinguish cancer patients from the control group and has higher sensitivity (100%) than PSA detection (58.3%). In contrast, the PSA level was found not to be directly related to PCa.


Subjects
Graphite; MicroRNAs; Prostatic Neoplasms; Male; Humans; Prostate-Specific Antigen/genetics; Graphite/chemistry; Prostatic Neoplasms/diagnosis; Prostatic Neoplasms/genetics; Biomarkers, Tumor/genetics; DNA, Single-Stranded; MicroRNAs/genetics
16.
Protein Sci; 32(2): e4555, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36564866

ABSTRACT

The development of efficient computational methods for drug target protein identification can compensate for the high cost of experiments and is therefore of great significance for drug development. However, existing structure-based drug target protein-identification algorithms are limited by the insufficient number of proteins with experimentally resolved structures. Moreover, sequence-based algorithms cannot effectively extract information from protein sequences and thus display insufficient accuracy. Here, we combined the sequence-based self-supervised pretraining protein language model ESM1b with a graph convolutional neural network classifier to develop an improved, sequence-based drug target protein identification method. This complete model, named QuoteTarget, efficiently encodes proteins based on sequence information alone and achieves an accuracy of 95% with the nonredundant drug target and nondrug target datasets constructed for this study. When applied to all proteins from Homo sapiens, QuoteTarget identified 1213 potential undeveloped drug target proteins. We further inferred residue-binding weights from the well-trained network using the gradient-weighted class activation mapping (Grad-CAM) algorithm. Notably, we found that without any binding site information input, significant residues inferred by the model closely match the experimentally confirmed drug molecule-binding sites. Thus, our work provides a highly effective sequence-based identifier for drug target proteins, as well as new insights into recognizing drug molecule-binding sites. The entire model is available at https://github.com/Chenjxjx/drug-target-prediction.


Subjects
Computer Neural Networks , Proteins , Humans , Proteins/chemistry , Algorithms , Binding Sites , Amino Acid Sequence
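
The Grad-CAM step mentioned in the abstract of record 16 can be illustrated with a minimal sketch. It assumes the standard Grad-CAM recipe applied to per-residue (graph node) features: average each channel's gradient over all residues, then take the ReLU of the weighted sum of each residue's activations. The function name and the toy numbers are hypothetical and are not taken from the QuoteTarget code.

```python
def grad_cam_residue_weights(activations, gradients):
    """Per-residue Grad-CAM weights (illustrative sketch, not the authors' code).

    activations[n][k]: feature k of residue n from the last graph-conv layer.
    gradients[n][k]:   d(class score) / d(activations[n][k]).
    """
    n_res = len(activations)
    n_ch = len(activations[0])
    # Channel importance: the gradient averaged over all residues (nodes).
    alpha = [sum(gradients[n][k] for n in range(n_res)) / n_res
             for k in range(n_ch)]
    # Residue weight: ReLU of the alpha-weighted sum of that residue's features.
    return [max(0.0, sum(alpha[k] * activations[n][k] for k in range(n_ch)))
            for n in range(n_res)]


# Toy input: 3 residues, 2 channels (made-up values for illustration only).
acts = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]
grads = [[0.2, -0.1], [0.4, -0.3], [0.0, -0.2]]
weights = grad_cam_residue_weights(acts, grads)
```

In this toy case only the first residue receives a positive weight; in practice the activations and gradients would come from the trained classifier.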
17.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36383167

ABSTRACT

MOTIVATION: Single-cell multi-omics sequencing techniques have developed rapidly in the past few years. Clustering analysis of single-cell multi-omics data may offer novel perspectives for dissecting cellular heterogeneity. However, multi-omics data are inherently high-dimensional and highly sparse, and they contain doublets. Moreover, representations of different omics, even from the same cell, follow diverse distributions. Without proper distribution alignment, clustering methods yield less separable clusters that are easily affected by the less informative omics data. RESULTS: We developed MoClust, a novel joint clustering framework that can be applied to several types of single-cell multi-omics data. A selective automatic doublet detection module that identifies and filters out doublets is introduced in the pretraining stage to improve data quality. Omics-specific autoencoders are introduced to characterize the multi-omics data. A contrastive learning approach to distribution alignment is adopted to adaptively fuse the omics representations into an omics-invariant representation. This alignment improves the compactness and separability of clusters while accurately weighting the contribution of each omics to the clustering objective. Extensive experiments on both simulated and real multi-omics datasets demonstrated the alignment, doublet detection, and clustering abilities of MoClust. AVAILABILITY AND IMPLEMENTATION: An implementation of MoClust is available from https://doi.org/10.5281/zenodo.7306504. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Accuracy , Multiomics , Cluster Analysis
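
The idea of contrastive alignment between omics representations in record 17 can be sketched with a generic InfoNCE loss. The abstract does not state which contrastive loss MoClust uses, so this is an assumed stand-in. The names `omics_a`, `omics_b_aligned`, and `omics_b_swapped` and the temperature value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE loss: anchors[i] should match positives[i], not positives[j != i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Two cells, two omics views each (made-up 2-D embeddings).
omics_a = [[1.0, 0.0], [0.0, 1.0]]
omics_b_aligned = [[0.9, 0.1], [0.1, 0.9]]   # same cell lies close across views
omics_b_swapped = [[0.1, 0.9], [0.9, 0.1]]   # views disagree about which cell is which

loss_aligned = info_nce(omics_a, omics_b_aligned)
loss_swapped = info_nce(omics_a, omics_b_swapped)
```

The loss is lower when the two views of the same cell are close together, which is the behavior a distribution-alignment objective of this kind rewards.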
18.
Front Genet ; 13: 977968, 2022.
Article in English | MEDLINE | ID: mdl-36072672

ABSTRACT

Single-cell multiomics sequencing techniques have developed rapidly in the past few years. Among these techniques, single-cell cellular indexing of transcriptomes and epitopes (CITE-seq) allows simultaneous quantification of gene expression and surface proteins. Clustering CITE-seq data has great potential to provide a more comprehensive and in-depth view of cell states and interactions. However, CITE-seq data inherit the properties of scRNA-seq data: they are noisy, high-dimensional, and highly sparse. Moreover, representations of RNA and surface protein are sometimes weakly correlated and contribute differently to the clustering objective. To overcome these obstacles and find a combined representation well suited for clustering, we propose scCTClust for clustering analysis of multiomics data, especially CITE-seq data. Two omics-specific neural networks are introduced to extract cluster information from the omics data. A deep canonical correlation method is adopted to find the maximally correlated representations of the two omics. A novel decentralized clustering method is applied to the linear combination of the latent representations of the two omics. The fusion weights, which account for the contribution of each omics to clustering, are adaptively updated during training. Extensive experiments on both simulated and real CITE-seq datasets demonstrated the power of scCTClust. We also applied scCTClust to transcriptome-epigenome data to illustrate its potential for generalization.
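
The linear combination of two latent representations with adaptive fusion weights, as described in record 18, can be sketched as follows. The softmax parametrization of the weights is an assumption made here so that they stay positive and sum to one. The abstract does not say how scCTClust constrains or updates them, and all names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of numbers."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(z_rna, z_protein, weight_logits):
    """Weighted sum of two latent vectors of equal length.

    weight_logits would be trainable parameters in a real model;
    here they are fixed inputs for illustration.
    """
    w_rna, w_protein = softmax(weight_logits)
    return [w_rna * a + w_protein * b for a, b in zip(z_rna, z_protein)]


# Equal logits give each omics a weight of 0.5.
fused = fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])
```

Raising one logit relative to the other shifts the fused representation toward that omics, which is how a learned weight could reflect each modality's contribution to clustering.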

19.
Article in English | MEDLINE | ID: mdl-35675236

ABSTRACT

This article studies self-supervised graph representation learning, which is critical to various tasks, such as protein property prediction. Existing methods typically aggregate the representations of individual nodes into graph representations, but fail to comprehensively explore local substructures (i.e., motifs and subgraphs), which also play important roles in many graph mining tasks. In this article, we propose a self-supervised graph representation learning framework named cluster-enhanced Contrast (CLEAR) that models the structural semantics of a graph at graph-level and substructure-level granularities, i.e., global semantics and local semantics, respectively. Specifically, we use graph-level augmentation strategies followed by a graph neural network-based encoder to explore global semantics. For local semantics, we first use graph clustering techniques to partition each whole graph into several subgraphs while preserving as much semantic information as possible. We then employ a self-attention interaction module to aggregate the semantics of all subgraphs into a local-view graph representation. Moreover, we integrate both global and local semantics into a multiview graph contrastive learning framework, enhancing the semantic-discriminative ability of the graph representations. Extensive experiments on various real-world benchmarks demonstrate the efficacy of the proposed framework over current self-supervised graph representation learning approaches on both graph classification and transfer learning tasks.

20.
Brief Funct Genomics ; 21(4): 325-338, 2022 07 27.
Article in English | MEDLINE | ID: mdl-35760070

ABSTRACT

Identification of cancer-related genes is helpful for understanding the pathogenesis of cancer, developing targeted drugs, and creating new diagnostic and therapeutic methods. Given the complexity of biological laboratory methods and the increasing availability of high-throughput data, many network-based methods have been proposed to identify cancer-related genes from a global perspective. Some studies have focused on tissue-specific cancer networks. However, cancers from different tissues may share common features, and those methods may ignore the differences and similarities across cancers during model construction. In this work, to make full use of the global information in the network, we first establish a pan-cancer network via a differential network algorithm; the network contains heterogeneous data not only across multiple cancer types but also between tumor samples and normal samples. Second, the node representation vectors are learned by network embedding. In contrast to ranking analysis-based methods, and with the help of integrative network analysis, we transform the cancer-related gene identification problem into a binary classification problem. The final results are obtained via ensemble classification. We further applied these methods to the most commonly used gene expression data involving six tissue-specific cancer types. As a result, an integrative pan-cancer network and several biologically meaningful results were obtained. For example, nine genes were ultimately identified as potential pan-cancer-related genes. Most of these genes have been reported in published studies, showing our method's potential for identifying driver gene candidates for further experimental verification.


Subjects
Neoplasms , Oncogenes , Algorithms , Gene Regulatory Networks , Humans , Neoplasms/genetics , Neoplasms/pathology
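
The abstract of record 20 does not specify its differential network algorithm. One common reading, sketched below purely as an assumption, is a differential co-expression network: two genes are linked when their Pearson correlation differs strongly between tumor and normal samples. The gene labels, expression values, and threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def differential_edges(tumor, normal, threshold=0.5):
    """Return (gene, gene, delta) for pairs whose correlation shifts by >= threshold.

    tumor / normal: dict mapping gene name -> expression values across samples.
    """
    genes = sorted(tumor)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            delta = pearson(tumor[g], tumor[h]) - pearson(normal[g], normal[h])
            if abs(delta) >= threshold:
                edges.append((g, h, delta))
    return edges


# Made-up expression profiles for three placeholder genes.
tumor = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]}
normal = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [4, 1, 3, 2]}
edges = differential_edges(tumor, normal)
```

The resulting edge list could then be fed to a network embedding method to obtain node vectors for classification, as the abstract outlines.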
...